Lindley's paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give opposite results for certain choices of the prior distribution. The problem of the disagreement between the two approaches was discussed in Harold Jeffreys' textbook[1]; it became known as Lindley's paradox after Dennis Lindley called the disagreement a paradox in a 1957 paper[2].
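Consider a null hypothesis H0, the result of an experiment x, and a prior distribution that favors H0 weakly. Lindley's paradox occurs when (1) the result x is significant by a frequentist test of H0, indicating sufficient evidence to reject H0 at, say, the 5% level, and (2) the posterior probability of H0 given x is high, indicating strong evidence that H0 is in fact true.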
These results can happen at the same time when the prior distribution is the sum of a sharp peak at H0 with probability p and a broad distribution with the rest of the probability 1 − p. The paradox arises because the prior has a sharp feature at H0 and no sharp features anywhere else.
We can illustrate Lindley's paradox with a numerical example. Imagine a certain city where 49,581 boys and 48,870 girls have been born over a certain time period. The observed proportion ($x$) of male births is thus 49,581/98,451 ≈ 0.5036. We are interested in testing whether the true proportion ($\theta$) is 0.5. That is, our null hypothesis is $H_0: \theta = 0.5$ and the alternative is $H_1: \theta \neq 0.5$.
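We have no reason to believe that the proportion of male births should be different from 0.5, so we assign prior probabilities $\pi(H_0) = 0.5$ and $\pi(H_1) = 0.5$, the latter uniformly distributed between 0 and 1. The prior distribution is thus a mixture of a point mass at 0.5 and a uniform distribution $U(0, 1)$. The number of male births is a binomial variable with mean $n\theta$ and variance $n\theta(1 - \theta)$, where $n$ is the total number of births (98,451 in this case). Because the sample size is very large, and the observed proportion is far from 0 and 1, we can use a normal approximation for the distribution of $x \sim N(\theta, \sigma^2)$. Because of the large sample, we can approximate the variance as $\sigma^2 \approx x(1 - x)/n$. The posterior probability is
$$P(H_0 \mid x) = \frac{P(x \mid H_0)\,\pi(H_0)}{P(x \mid H_0)\,\pi(H_0) + P(x \mid H_1)\,\pi(H_1)} \approx 0.95,$$
where $P(x \mid H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - 0.5)^2/(2\sigma^2)}$ is the normal density of $x$ at $\theta = 0.5$ and $P(x \mid H_1) = \int_0^1 \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \theta)^2/(2\sigma^2)}\, d\theta \approx 1$ averages that density over the uniform prior.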
So the posterior strongly favors $H_0$, and we find that there is not enough evidence to reject it.
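The Bayesian calculation can be checked numerically. The following is a minimal sketch, assuming equal prior weights of 0.5 on the two hypotheses, a uniform prior on the proportion under the alternative, and the normal approximation with variance x(1 − x)/n; the helper names `normal_pdf` and `normal_cdf` are illustrative and use only the Python standard library.

```python
import math

# Birth counts from the example.
boys, girls = 49_581, 48_870
n = boys + girls
x = boys / n  # observed proportion of male births

# Normal approximation: x ~ N(theta, sigma^2), sigma^2 ~ x(1 - x)/n.
sigma = math.sqrt(x * (1 - x) / n)


def normal_pdf(value, mean, sd):
    """Density of N(mean, sd^2) at value (illustrative helper)."""
    z = (value - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(value, mean, sd):
    """Cumulative distribution of N(mean, sd^2) at value (illustrative helper)."""
    return 0.5 * (1 + math.erf((value - mean) / (sd * math.sqrt(2))))


# Assumed prior weights: half on the point null, half on the uniform alternative.
prior_h0 = prior_h1 = 0.5

# Likelihood of x under the null (theta = 0.5).
like_h0 = normal_pdf(x, 0.5, sigma)

# Marginal likelihood under the alternative: the normal density of x
# integrated over theta in [0, 1], which has a closed form via the CDF.
like_h1 = normal_cdf(1.0, x, sigma) - normal_cdf(0.0, x, sigma)

posterior_h0 = like_h0 * prior_h0 / (like_h0 * prior_h0 + like_h1 * prior_h1)
print(f"P(H0 | x) = {posterior_h0:.2f}")  # about 0.95
```

The marginal likelihood under the alternative is very close to 1 here, because the normal density is concentrated well inside the interval from 0 to 1; the posterior is therefore driven almost entirely by the height of the density at 0.5.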
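Using the normal approximation for the observed proportion $x$, with $\sigma^2 \approx x(1 - x)/n$, the upper tail probability under the null hypothesis is
$$P(X \geq x \mid \theta = 0.5) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(u - 0.5)^2/(2\sigma^2)}\, du \approx 0.0117.$$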
Because we are performing a two-sided test (we would have been equally surprised if we had seen 48,870 boy births, i.e. $x \approx 0.4964$), the p-value is $p \approx 2 \times 0.0117 = 0.0235$, which is lower than the significance level of 5%. Therefore, we reject $H_0$.
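The frequentist side of the example can be reproduced in a few lines. This is a sketch under the same normal approximation, with the standard deviation of the observed proportion taken as the square root of x(1 − x)/n; the variable names are illustrative, and the tail probability is computed with `math.erfc` from the standard library.

```python
import math

# Birth counts from the example.
boys, girls = 49_581, 48_870
n = boys + girls
x = boys / n  # observed proportion of male births

# Standard deviation of the proportion under the normal approximation.
sigma = math.sqrt(x * (1 - x) / n)

# Standardized distance of the observation from the null value 0.5.
z = (x - 0.5) / sigma

# Upper tail probability of a standard normal beyond z.
upper_tail = 0.5 * math.erfc(z / math.sqrt(2))

# Two-sided p-value: deviations in either direction count as surprising.
p_value = 2 * upper_tail

print(f"z = {z:.3f}")
print(f"upper tail = {upper_tail:.4f}")  # about 0.0117
print(f"p-value = {p_value:.4f}")
print("reject H0 at 5%:", p_value < 0.05)
```

The observation lies a little more than two standard deviations above 0.5, which is enough to cross the conventional 5% threshold even though the absolute deviation of the proportion is small.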
The two approaches, the Bayesian and the frequentist, reach opposite conclusions from the same data, and this is the paradox.